Methods and Applications for High-Frequency Biosignals Data

Lily Koffman

Department of Biostatistics, Johns Hopkins School of Public Health

Introduction: accelerometry data

Introduction: big accelerometry data

Outline


  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer? (Digital fingerprinting)
  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets?
  • Can we accurately find walking and count steps in free-living datasets?
  • Can we generalize conclusions from free-living accelerometry data to the US population?
  • Can we apply these methods in other, non-accelerometry datasets?

Can we identify someone from their walking pattern measured by a wrist-worn accelerometer?

Problem setup

Big picture method: time series to scalar predictors

Details of the method

For each second and each person:

  • Obtain joint distribution of acceleration and lag acceleration for a series of lags

  • Calculate scalar summaries of the joint distribution

  • I will walk through the process for one second, one person, and one lag

  • Intuition: walking is a cyclic process, and we want to leverage that cyclic structure.
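
As a rough illustration of the two steps above for one second, one person, and one lag, the sketch below pairs each acceleration value with its lagged value and counts the pairs that fall in each cell of a grid. The simulated signal, the lag of 15 samples, and the cut points are all assumptions chosen for the example, not the values used in the talk.

```r
# Simulated one second of acceleration magnitude at 100 Hz (hypothetical data).
set.seed(1)
s <- 1:100
v <- 1 + 0.5 * sin(2 * pi * s / 50) + rnorm(100, sd = 0.05)

lag <- 15                                 # lag u, in samples (assumed value)
accel     <- v[(lag + 1):length(v)]       # v(s)
lag_accel <- v[1:(length(v) - lag)]       # v(s - u)

# Assumed cut points defining a grid over (lag acceleration, acceleration).
cuts <- seq(0, 3, by = 0.25)
cell_counts <- table(
  cut(lag_accel, cuts, include.lowest = TRUE),
  cut(accel, cuts, include.lowest = TRUE)
)

# One scalar predictor per grid cell: the number of pairs falling in it.
predictors <- as.vector(cell_counts)
```

Repeating this for several lags and for every second gives one row of scalar predictors per person-second.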

Obtain joint distribution of acceleration and lag acceleration

Derive predictors from joint distribution

Repeat for multiple lags

Repeat for multiple seconds

Fingerprints summarize predictors for a given lag and are different across individuals

Ready to fit models

Results

Koffman et al. (2023)

Why not use all possible lags?

Functional regression approach

Toy example: 4 observations per second, 2 seconds, 1 individual

\(v_1(2)\): 2nd acceleration observation in second 1

data \[\begin{bmatrix} v_1(1) & v_1(2) & v_1(3) & v_1(4) \\ v_2(1) & v_2(2) & v_2(3) & v_2(4) \\ \end{bmatrix} \]

acceleration matrix \[\begin{bmatrix} v_1(2) & v_1(3) & v_1(4) & v_1(3) & v_1(4) & v_1(4) \\ v_2(2) & v_2(3) & v_2(4) & v_2(3) & v_2(4) & v_2(4) \end{bmatrix} \]

lag acceleration matrix \[\begin{bmatrix} v_1(1) & v_1(1) & v_1(1) & v_1(2) & v_1(2) & v_1(3) \\ v_2(1) & v_2(1) & v_2(1) & v_2(2) & v_2(2) & v_2(3) \end{bmatrix} \]

lag matrix \[\begin{bmatrix} 1 & 2 & 3 & 1 & 2 & 1 \\ 1 & 2 & 3 & 1 & 2 & 1 \end{bmatrix} \]

Number of columns: \(4 \cdot (4-1) / 2 = 6\)
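
The three matrices in the toy example can be built mechanically from the data matrix. The sketch below is one way to do it; the helper name `build_lag_matrices` and the integer stand-in values are hypothetical and are only there to make the column ordering easy to check.

```r
# Build the acceleration, lag-acceleration, and lag matrices from an
# n_seconds x S data matrix (rows = seconds, columns = within-second samples).
build_lag_matrices <- function(v_mat) {
  S <- ncol(v_mat)
  # All index pairs (earlier, later), ordered by the earlier index,
  # which matches the column order of the toy example.
  pairs <- t(combn(S, 2))
  from <- pairs[, 1]   # index s - u
  to   <- pairs[, 2]   # index s
  list(
    accel_mat     = v_mat[, to, drop = FALSE],
    lag_accel_mat = v_mat[, from, drop = FALSE],
    lag_mat       = matrix(to - from, nrow = nrow(v_mat),
                           ncol = length(to), byrow = TRUE)
  )
}

# Stand-in values: v_1(1..4) = 1..4 and v_2(1..4) = 5..8.
v_mat <- matrix(1:8, nrow = 2, byrow = TRUE)
mats <- build_lag_matrices(v_mat)
```

With these stand-in values, the first row of `mats$accel_mat` is 2, 3, 4, 3, 4, 4 and the first row of `mats$lag_mat` is 1, 2, 3, 1, 2, 1, matching the matrices above.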

Functional regression approach

Model outcomes as:

\[Y_{ij}^{i_0}\sim\text{Bernoulli}(p_{ij}^{i_0})\]

where \(Y_{ij}^{i_0} = 1\) if second \(j\) of subject \(i\) belongs to subject \(i_0\) (that is, \(i = i_0\)), and 0 otherwise

Model:

\[\text{logit}(p_{ij}^{i_0}) =\beta_0^{i_0} + \int_{u=1}^S\int_{s=u}^SF_{i_0}\{ v_{ij}(s), v_{ij}(s-u), u\}dsdu \]

\(u = 1, \dots, S\): lag, where \(S = 100\) is the number of observations per second

\(v_{ij}(s)\) = acceleration at centisecond \(s\) for subject \(i\) in second \(j\)

\(F_{i_0}(\cdot, \cdot, \cdot)\): trivariate smooth function, defined at every point in the domain of acceleration, lag acceleration, and lag

Fit using penalized splines with a quadratic penalty on the functional coefficient (Wood 2016)

Functional regression approach: implementation

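# The smooth's arguments are matrices, so mgcv treats te() as a linear
# functional term: it is evaluated at every column and summed across
# columns, weighted by weight_mat
model = mgcv::gam(
  Y_mat ~ te(
    accel_mat,      # v(s)
    lag_accel_mat,  # v(s - u)
    lag_mat,        # u
    k = c(5, 5, 5),
    by = weight_mat),
  family = binomial(),
  method = "REML"
)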
  • \(\texttt{te()}\): tensor product smooth

  • \(\texttt{k = c(5, 5, 5)}\) number of basis functions for each dimension of the tensor product smooth

  • \(\texttt{weight\_mat}\): matrix of weights of linear functionals of smooth terms. We use equal weights so the \(i,j^{\mathrm{th}}\) entry is \(\texttt{1/ncol(accel\_mat)}\)

  • \(\texttt{method="REML"}\): smoothing parameter selection with restricted maximum likelihood
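
To connect the code to the model, it may help to write out what the matrix arguments do. When the arguments of a smooth are matrices, \(\texttt{mgcv}\) sums the smooth over the columns, weighted by the \(\texttt{by}\) matrix. The symbols \(C\), \(c\), \(a_{ij,c}\), \(\ell_{ij,c}\), \(u_c\), and \(w_c\) below are notation introduced here for the sketch and do not appear in the original slides:

\[
\text{logit}(p_{ij}^{i_0}) \approx \beta_0^{i_0} + \sum_{c=1}^{C} w_c \, F_{i_0}\{a_{ij,c},\, \ell_{ij,c},\, u_c\}, \qquad w_c = \frac{1}{C}
\]

Here \(a_{ij,c}\), \(\ell_{ij,c}\), and \(u_c\) are the entries in column \(c\) of \(\texttt{accel\_mat}\), \(\texttt{lag\_accel\_mat}\), and \(\texttt{lag\_mat}\) for the row holding subject \(i\), second \(j\), and \(C = \texttt{ncol(accel\_mat)}\). With equal weights the sum is an average of \(F_{i_0}\) over all (acceleration, lag acceleration, lag) triples, which is a discrete stand-in for the double integral in the model.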

Connection between two methods

  • Functional approach is generalization of grid cell approach
  • Instead of discretizing space into cells, relationship between acceleration, lag acceleration, lag length is modeled as smooth, continuous surface \(F\)
  • Grid cell approach: piecewise constant approximation of \(F\)
  • Both leverage same underlying structure
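
The piecewise constant statement can be written out as a short derivation. The cell sets \(A_g\), coefficients \(\theta_{g,u}^{i_0}\), and counts \(n_{ij,g}(u)\) are notation assumed here for illustration:

\[
F_{i_0}(a, \ell, u) \approx \sum_{g=1}^{G} \theta_{g,u}^{i_0}\, \mathbf{1}\{(a, \ell) \in A_g\}
\quad\Longrightarrow\quad
\int_{s} F_{i_0}\{v_{ij}(s), v_{ij}(s-u), u\}\, ds \approx \sum_{g=1}^{G} \theta_{g,u}^{i_0}\, n_{ij,g}(u)
\]

where \(n_{ij,g}(u)\) is the number of pairs \(\{v_{ij}(s), v_{ij}(s-u)\}\) that fall in grid cell \(A_g\) for subject \(i\), second \(j\), and lag \(u\). Under this approximation the functional model reduces to a logistic regression on cell counts.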

Functional regression results

Rank-1 (rank-5) % accuracies

153 person dataset

3 min of walking each

Two sessions at least 1 week apart

  • Train and test on session 1
    • Logistic regression: 92 (97)
    • Functional regression: 98 (100)
  • Train on session 1, test on session 2
    • Logistic regression: 41 (75)
    • Functional regression: 53 (69)
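
For readers unfamiliar with rank-\(k\) accuracy, the sketch below shows one way to compute it from a matrix of predicted probabilities. The function name `rank_k_accuracy` and the tiny example matrix are hypothetical. The sketch assumes second-level predictions have already been aggregated to one row per test individual, and it makes no claim about how the talk does that aggregation.

```r
# prob_mat: rows = test individuals, columns = candidate identities;
# entry [r, c] is the predicted probability that test individual r is identity c.
# true_id: column index of the correct identity for each row.
rank_k_accuracy <- function(prob_mat, true_id, k) {
  hits <- vapply(seq_len(nrow(prob_mat)), function(r) {
    # Rank of the true identity among all candidates (1 = highest probability).
    true_rank <- rank(-prob_mat[r, ], ties.method = "min")[true_id[r]]
    true_rank <= k
  }, logical(1))
  100 * mean(hits)
}

prob_mat <- rbind(
  c(0.7, 0.2, 0.1),   # true identity 1 is ranked 1st
  c(0.5, 0.3, 0.2),   # true identity 2 is ranked 2nd
  c(0.6, 0.3, 0.1)    # true identity 3 is ranked 3rd
)
true_id <- c(1, 2, 3)
rank1 <- rank_k_accuracy(prob_mat, true_id, k = 1)  # 1 of 3 individuals counted
rank2 <- rank_k_accuracy(prob_mat, true_id, k = 2)  # 2 of 3 individuals counted
```

Rank-1 accuracy counts an individual as correct only when their own model gives the highest probability; rank-5 accuracy counts them as correct when they are among the five highest.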

Why not use fancier models?

Rank-1 (rank-5) % accuracies

153 person dataset

3 min of walking each

Two sessions at least 1 week apart

  • Train and test on session 1
    • Logistic regression: 92 (97)
    • Functional regression: 98 (100)
    • XGBoost: 93 (99)
  • Train on session 1, test on session 2
    • Logistic regression: 41 (75)
    • Functional regression: 53 (69)
    • XGBoost: 58 (78)

Koffman, Crainiceanu, and Leroux (2024)

Outline


  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer? (Digital fingerprinting)
    \(\rightarrow\) Yes!
  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets when we don’t know when people are walking?
  • Can we accurately find walking and count steps in free-living datasets?
  • Can we generalize conclusions from free-living accelerometry data to the US population?
  • Can we apply these methods in other, non-accelerometry datasets?

Can we accurately find walking and count steps in free-living datasets?

Validation: identifying walking in free-living datasets

  • Track down free-living datasets with wrist-worn accelerometry and gold standard step counts
  • Find and implement open-source algorithms for step counting from wrist accelerometry
  • Evaluate performance of algorithms

5 open-source algorithms, 3 datasets with gold-standard step counts

Koffman and Muschelli (2024)

Application: identifying walking and counting steps in NHANES

  • \(>15{,}000\) participants
  • \(7\) days of wrist accelerometry
  • \(10\) TB of data
  • Over 1 year of computation time
  • Open-source pipeline
  • Open-source data repository
  • First nationally representative estimate of steps in the US population using open-source algorithms

Koffman and Muschelli (2025a)

How many steps does the average American take per day?

Do estimates differ by algorithm?

Are more steps associated with lower mortality risk?

Koffman, Crainiceanu, and Muschelli (2024)

Outline


  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer? (Digital fingerprinting)
    \(\rightarrow\) Yes!
  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets?
  • Can we accurately find walking and count steps in free-living datasets?
    \(\rightarrow\) Yes!
  • Can we generalize conclusions from free-living accelerometry data to the US population?
  • Can we apply these methods in other, non-accelerometry datasets?

Can we generalize conclusions from free-living accelerometry data to the US population?

Sex differences in steps?

Do males take more steps than females? At what points during the day?

Function on scalar regression

  • Outcome: steps profile over the course of the day (function)

  • Predictors: age, sex (scalars)

  • Model: \[\mathbb{E}[\mathrm{steps}_i(s)] = \beta_0(s) + \beta_1(s)\mathrm{sex}_i + \beta_2(s)\mathrm{age}_i \] \(i\): participant; \(s \in \{1, \dots, 1440\}\): each minute of the day

  • Interpretation

    • \(\beta_0(s)\): expected steps for reference sex category and age 0
    • \(\beta_1(s)\): difference in expected steps at minute \(s\) for females compared to males, holding age constant
    • Can be interpreted at specific points of the day (e.g., how many more steps do males take at 12pm?)

Function on scalar regression: implementation

  • Fast univariate inference (FUI) (Cui et al. 2021)

  • Fit separate (univariate) GLM at each point \(s\), smooth the resulting point estimates

  • Bootstrap subjects to get confidence bands

  • BUT: NHANES is not a simple random sample

    • Individuals are sampled in geographic clusters

    • Minority groups are oversampled

  • Are our estimates valid for population-level inference?
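
The first two FUI steps listed above (pointwise fits, then smoothing) can be sketched in base R. This is a minimal illustration on simulated data, not the FUI software itself: the sample size, the grid of 60 points, the simulated effect, and the use of `smooth.spline` as the smoother are all assumptions, and the bootstrap step is omitted.

```r
set.seed(2)
n <- 200; S <- 60                       # hypothetical: 200 subjects, 60 time points
sgrid <- seq(0, 1, length.out = S)
sex <- rbinom(n, 1, 0.5)
age <- runif(n, 20, 80)
beta1 <- sin(pi * sgrid)                # true sex effect, used only to simulate

# Outcome matrix: row i is subject i's curve; 0.01 * age is added to every
# column of row i because R recycles the length-n vector down the columns.
Y <- 2 + outer(sex, beta1) + 0.01 * age + matrix(rnorm(n * S), n, S)

# Step 1: separate (univariate) linear model at each point s.
# lm() with a matrix response fits one regression per column.
fit <- lm(Y ~ sex + age)
beta_raw <- coef(fit)                   # 3 x S: rows are intercept, sex, age

# Step 2: smooth each raw coefficient curve over s.
beta_smooth <- t(apply(beta_raw, 1, function(b) smooth.spline(sgrid, b)$y))
```

The second row of `beta_smooth` is the smoothed estimate of \(\beta_1(s)\). Confidence bands would then come from refitting on resampled subjects, which is where the sampling design matters.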

Survey function on scalar regression

  • For standard regression, well-developed methods and software exist that take into account weights and correlation within clusters (e.g. \(\texttt{svyglm}\), \(\texttt{svycoxph}\)) (Lumley 2010)

  • No such methods for function on scalar regression

  • FUI is built on separate GLMs

  • Idea: incorporate survey weights into the GLMs and use survey-aware replication/bootstrap methods for inference
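
The point-estimate half of this idea can be sketched with weighted least squares at each point \(s\). The simulated data and weights below are hypothetical, and the sketch deliberately stops at point estimates: the standard errors reported by `lm` with weights are not design-based, so inference would still need replicate weights or a survey-aware bootstrap.

```r
set.seed(3)
n <- 300; S <- 24                       # hypothetical sample size and grid
sex <- rbinom(n, 1, 0.5)
age <- runif(n, 20, 80)
w <- runif(n, 0.5, 5)                   # hypothetical survey weights
Y <- 1 + outer(sex, rep(0.5, S)) + matrix(rnorm(n * S), n, S)

# Survey-weighted point estimates: weighted least squares at each point s.
beta_w <- sapply(seq_len(S), function(s) {
  coef(lm(Y[, s] ~ sex + age, weights = w))
})
# beta_w is a 3 x S matrix (intercept, sex, age); smoothing over s would follow.
```

Each column of `beta_w` equals the closed-form weighted estimate \((X^\top W X)^{-1} X^\top W y\) for that point, which is the same point estimate a design-weighted GLM would give.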

Survey function on scalar regression: simulation

First-ever simulation study to evaluate function on scalar regression in complex survey settings

  • Generate superpopulation
    • Each individual has functional outcome and covariate
    • Individuals belong to clusters (geographic areas)
    • Functional outcomes correlated within clusters
  • Sample from superpopulation in each iteration of simulation
    • Selection probability related to functional outcome (informative sampling)
  • Compare:
    • Unweighted model with standard bootstrap
    • Weighted models with weighted bootstrap
    • Weighted models with complex survey bootstrap (2)
  • Evaluate:
    • Bias in coefficient estimation
    • Coverage of confidence intervals

Koffman et al. (2025)

Survey function on scalar regression: software

model_fit = svyfosr::svyfui(steps_mat ~ age + sex,
                            weights = survey_weight,
                            data = steps_df,
                            family = gaussian(),
                            boot_type = "BRR",
                            num_boots = 500,
                            parallel = TRUE,
                            seed = 2213)
Koffman and Muschelli (2025b)

Survey function on scalar regression: application

Complex survey function on scalar regression: application

Outline


  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer? (Digital fingerprinting)
    \(\rightarrow\) Yes!
  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets?
  • Can we accurately find walking and count steps in free-living datasets?
    \(\rightarrow\) Yes!
  • Can we generalize conclusions from free-living accelerometry data to the US population?
    \(\rightarrow\) Yes!
  • Can we apply these methods in other, non-accelerometry datasets?

Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets?

Digital fingerprinting in NHANES

  • Use highly specific method to identify walking in NHANES (minimize false positives) (Karas et al. 2019)
  • \(N = 13{,}000\) individuals with 3 minutes of walking per person
  • 3:1 train/test split
  • Logistic regression + weighting to overcome class imbalance
  • 43% rank-1 accuracy
  • 73% rank-5 accuracy
  • 97% rank-1% accuracy (correct subject is in the top 130 predictions)
  • 100% rank-5% accuracy (correct subject is in the top 650 predictions)

Koffman, Muschelli, and Crainiceanu (2025)

Outline


  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer? (Digital fingerprinting)
    \(\rightarrow\) Yes!
  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets?
    \(\rightarrow\) Yes!
  • Can we accurately find walking and count steps in free-living datasets?
    \(\rightarrow\) Yes!
  • Can we generalize conclusions from free-living accelerometry data to the US population?
    \(\rightarrow\) Yes!
  • Can we apply these methods in other, non-accelerometry datasets?

Can we apply these methods in other, non-accelerometry datasets?

Arterial waveform

Fingerprinting with arterial waveform

  • Obtain predictors for many different lags and cut points
  • Use predictors that are top 10 contributors to first 30 PCs (\(\approx 100\) predictors)
  • Fit XGBoost model on 727 patients
  • Mean (SD) \(7 (1.8)\) minutes per patient, range \(3\)-\(16\) minutes
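
The PC-based predictor selection in the list above might look like the sketch below. It interprets "top contributors" as the predictors with the largest absolute loadings on each PC, which is an assumption; the simulated matrix and the smaller values of `n_pcs` and `n_top` are only for illustration.

```r
set.seed(4)
X <- matrix(rnorm(200 * 40), nrow = 200, ncol = 40)   # hypothetical predictor matrix
colnames(X) <- paste0("pred_", seq_len(ncol(X)))

n_pcs <- 5      # the slides use the first 30 PCs
n_top <- 3      # the slides use the top 10 contributors per PC

pca <- prcomp(X, center = TRUE, scale. = TRUE)
loadings <- pca$rotation[, seq_len(n_pcs), drop = FALSE]

# For each PC, keep the predictors with the largest absolute loadings,
# then take the union across PCs.
top_per_pc <- apply(abs(loadings), 2, function(l) {
  names(sort(l, decreasing = TRUE))[seq_len(n_top)]
})
selected <- unique(as.vector(top_per_pc))
```

Because the same predictor can be a top contributor to several PCs, the union is usually smaller than `n_pcs * n_top`, which would be consistent with 30 PCs and 10 contributors each yielding roughly 100 predictors.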

Outline


  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer? (Digital fingerprinting)
    \(\rightarrow\) Yes!
  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets?
    \(\rightarrow\) Yes!
  • Can we accurately find walking and count steps in free-living datasets?
    \(\rightarrow\) Yes!
  • Can we generalize conclusions from free-living accelerometry data to the US population?
    \(\rightarrow\) Yes!
  • Can we apply these methods in other, non-accelerometry datasets?
    \(\rightarrow\) Yes!

Future Directions

  • Using changes in fingerprint (both walking and waveform) to predict changes in function
  • Designing real-time interventions based on hemodynamic patterns
  • Extending survey FoSR to longitudinal outcomes
  • Standardizing processing and analysis pipelines for wearable accelerometry

Thank you!


References

Cui, Erjia, Andrew Leroux, Ekaterina Smirnova, and Ciprian M. Crainiceanu. 2021. “Fast Univariate Inference for Longitudinal Functional Models.” Journal of Computational and Graphical Statistics 31 (1): 219–30. https://doi.org/10.1080/10618600.2021.1950006.
Karas, Marta, Marcin Strączkiewicz, William Fadel, Jaroslaw Harezlak, Ciprian M Crainiceanu, and Jacek K Urbanek. 2019. “Adaptive Empirical Pattern Transformation (ADEPT) with Application to Walking Stride Segmentation.” Biostatistics 22 (2): 331–47. https://doi.org/10.1093/biostatistics/kxz033.
Koffman, Lily, Ciprian Crainiceanu, and Andrew Leroux. 2024. “Walking Fingerprinting.” Journal of the Royal Statistical Society Series C: Applied Statistics 73 (5): 1221–41. https://doi.org/10.1093/jrsssc/qlae033.
Koffman, Lily, Ciprian Crainiceanu, and John Muschelli. 2024. “Comparing Step Counting Algorithms for High-Resolution Wrist Accelerometry Data in NHANES 2011–2014.” Medicine & Science in Sports & Exercise 57 (4): 746–55. https://doi.org/10.1249/mss.0000000000003616.
Koffman, Lily, Sunan Gao, Xinkai Zhou, Andrew Leroux, Ciprian Crainiceanu, and John Muschelli III. 2025. “Function on Scalar Regression with Complex Survey Designs.” https://arxiv.org/abs/2511.05487.
Koffman, Lily, and John Muschelli. 2024. “Evaluating Step Counting Algorithms on Subsecond Wrist-Worn Accelerometry: A Comparison Using Publicly Available Data Sets.” Journal for the Measurement of Physical Behaviour 7 (1). https://doi.org/10.1123/jmpb.2024-0008.
———. 2025a. “Minute Level Step Counts and Physical Activity Data from the National Health and Nutrition Examination Survey (NHANES) 2011-2014.” PhysioNet. https://doi.org/10.13026/9N0R-TV02.
———. 2025b. Svyfosr: Survey-Weighted Function on Scalar Regression. https://github.com/jhuwit/svyfosr.
Koffman, Lily, John Muschelli, and Ciprian Crainiceanu. 2025. “Walking Fingerprinting Using Wrist Accelerometry During Activities of Daily Living in NHANES.” https://arxiv.org/abs/2506.17160.
Koffman, Lily, Yan Zhang, Jaroslaw Harezlak, Ciprian Crainiceanu, and Andrew Leroux. 2023. “Fingerprinting Walking Using Wrist-Worn Accelerometers.” Gait & Posture 103 (June): 92–98. https://doi.org/10.1016/j.gaitpost.2023.05.001.
Lumley, Thomas. 2010. Complex Surveys: A Guide to Analysis Using R. John Wiley & Sons.
Wood, Simon N. 2016. “P-Splines with Derivative Based Penalties and Tensor Product Smoothing of Unevenly Distributed Data.” Statistics and Computing 27 (4): 985–89. https://doi.org/10.1007/s11222-016-9666-x.